Overview

Dataset statistics

Number of variables: 12
Number of observations: 8190
Missing cells: 24040
Missing cells (%): 24.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 148.0 B

Variable types

NUM: 10
BOOL: 1
CAT: 1

Reproduction

Analysis started: 2020-07-30 22:35:58.290523
Analysis finished: 2020-07-30 22:36:17.060131
Duration: 18.77 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
Date has a high cardinality: 182 distinct values
MarkDown1 has 4158 (50.8%) missing values
MarkDown2 has 5269 (64.3%) missing values
MarkDown3 has 4577 (55.9%) missing values
MarkDown4 has 4726 (57.7%) missing values
MarkDown5 has 4140 (50.5%) missing values
CPI has 585 (7.1%) missing values
Unemployment has 585 (7.1%) missing values
MarkDown5 is highly skewed (γ1 = 50.2778242)
Date is uniformly distributed
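
The per-column missing counts in the warnings above can be cross-checked against the overview totals. The following is a minimal sketch, not part of the generated report: the dictionary name and variable names are illustrative, and the numbers are copied by hand from this report.

```python
# Missing-value counts per column, copied from the warnings in this report.
missing_by_column = {
    "MarkDown1": 4158,
    "MarkDown2": 5269,
    "MarkDown3": 4577,
    "MarkDown4": 4726,
    "MarkDown5": 4140,
    "CPI": 585,
    "Unemployment": 585,
}

# Dataset shape, copied from the "Dataset statistics" block.
n_variables = 12
n_observations = 8190

# Total missing cells is the sum over columns.
total_missing = sum(missing_by_column.values())

# The overall share divides by the number of cells (rows x columns).
missing_share = total_missing / (n_variables * n_observations)

print(total_missing)           # 24040
print(f"{missing_share:.1%}")  # 24.5%
```

Both values agree with the "Missing cells" and "Missing cells (%)" entries in the overview.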

Variables

Store
Real number (ℝ≥0)

Distinct count: 45
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.0
Minimum: 1
Maximum: 45
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
Median: 23
Q3: 34
95-th percentile: 43
Maximum: 45
Range: 44
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 12.9879661
Coefficient of variation (CV): 0.5646941782
Kurtosis: -1.201186459
Mean: 23
Median Absolute Deviation (MAD): 11
Skewness: 0
Sum: 188370
Variance: 168.6872634
Histogram with fixed size bins (bins=10)
Value              Count  Frequency (%)
43                 182    2.2%
41                 182    2.2%
33                 182    2.2%
29                 182    2.2%
25                 182    2.2%
21                 182    2.2%
17                 182    2.2%
13                 182    2.2%
9                  182    2.2%
5                  182    2.2%
Other values (35)  6370   77.8%

Value  Count  Frequency (%)
1      182    2.2%
2      182    2.2%
3      182    2.2%
4      182    2.2%
5      182    2.2%

Value  Count  Frequency (%)
45     182    2.2%
44     182    2.2%
43     182    2.2%
42     182    2.2%
41     182    2.2%

Date
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 182
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 64.1 KiB
Value               Count  Frequency (%)
28/01/2011          45     0.5%
19/11/2010          45     0.5%
04/06/2010          45     0.5%
15/07/2011          45     0.5%
18/05/2012          45     0.5%
26/04/2013          45     0.5%
05/02/2010          45     0.5%
20/08/2010          45     0.5%
26/11/2010          45     0.5%
31/12/2010          45     0.5%
Other values (172)  7740   94.5%

Length

Max length: 10
Mean length: 10
Min length: 10

Value              Count  Frequency (%)
Decimal_Number     10     90.9%
Other_Punctuation  1      9.1%

Value   Count  Frequency (%)
Common  11     100.0%

Value  Count  Frequency (%)
ASCII  11     100.0%

Temperature
Real number (ℝ)

Distinct count: 4178
Unique (%): 51.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.356197802197805
Minimum: -7.29
Maximum: 101.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: -7.29
5-th percentile: 26.849
Q1: 45.9025
Median: 60.71
Q3: 73.88
95-th percentile: 87.131
Maximum: 101.95
Range: 109.24
Interquartile range (IQR): 27.9775

Descriptive statistics

Standard deviation: 18.67860685
Coefficient of variation (CV): 0.3146867141
Kurtosis: -0.6108838043
Mean: 59.3561978
Median Absolute Deviation (MAD): 13.995
Skewness: -0.2833843522
Sum: 486127.26
Variance: 348.8903538
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
50.43                11     0.1%
70.28                11     0.1%
67.87                10     0.1%
76.67                9      0.1%
72.62                9      0.1%
70.87                9      0.1%
76.03                9      0.1%
53.59                8      0.1%
50.81                8      0.1%
40.65                8      0.1%
Other values (4168)  8098   98.9%

Value  Count  Frequency (%)
-7.29  1      < 0.1%
-6.61  1      < 0.1%
-6.08  1      < 0.1%
-2.06  1      < 0.1%
0.25   1      < 0.1%

Value   Count  Frequency (%)
101.95  3      < 0.1%
100.14  1      < 0.1%
100.07  1      < 0.1%
99.66   2      < 0.1%
99.22   3      < 0.1%

Fuel_Price
Real number (ℝ≥0)

Distinct count: 1011
Unique (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.405991819291819
Minimum: 2.472
Maximum: 4.468
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: 2.472
5-th percentile: 2.669
Q1: 3.041
Median: 3.513
Q3: 3.743
95-th percentile: 4.021
Maximum: 4.468
Range: 1.996
Interquartile range (IQR): 0.702

Descriptive statistics

Standard deviation: 0.4313365711
Coefficient of variation (CV): 0.1266405188
Kurtosis: -0.9523876532
Mean: 3.405991819
Median Absolute Deviation (MAD): 0.298
Skewness: -0.3050626486
Sum: 27895.073
Variance: 0.1860512376
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
3.417                43     0.5%
3.638                43     0.5%
3.63                 40     0.5%
3.583                39     0.5%
3.62                 37     0.5%
3.622                31     0.4%
3.524                31     0.4%
3.227                30     0.4%
3.611                30     0.4%
3.666                30     0.4%
Other values (1001)  7836   95.7%

Value  Count  Frequency (%)
2.472  1      < 0.1%
2.513  1      < 0.1%
2.514  14     0.2%
2.52   1      < 0.1%
2.533  1      < 0.1%

Value  Count  Frequency (%)
4.468  6      0.1%
4.449  6      0.1%
4.308  3      < 0.1%
4.301  6      0.1%
4.294  6      0.1%

MarkDown1
Real number (ℝ)

MISSING
Distinct count: 4023
Unique (%): 99.8%
Missing: 4158
Missing (%): 50.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7032.371785714286
Minimum: -2781.45
Maximum: 103184.98
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: -2781.45
5-th percentile: 109.416
Q1: 1577.5325
Median: 4743.58
Q3: 8923.31
95-th percentile: 21500.9325
Maximum: 103184.98
Range: 105966.43
Interquartile range (IQR): 7345.7775

Descriptive statistics

Standard deviation: 9262.747448
Coefficient of variation (CV): 1.317158383
Kurtosis: 23.68716731
Mean: 7032.371786
Median Absolute Deviation (MAD): 3569.965
Skewness: 4.016436305
Sum: 28354523.04
Variance: 85798490.28
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
150.46               2      < 0.1%
17.01                2      < 0.1%
2920.43              2      < 0.1%
460.73               2      < 0.1%
6510.79              2      < 0.1%
4855.31              2      < 0.1%
175.64               2      < 0.1%
1.5                  2      < 0.1%
8.62                 2      < 0.1%
8940.48              1      < 0.1%
Other values (4013)  4013   49.0%
(Missing)            4158   50.8%

Value     Count  Frequency (%)
-2781.45  1      < 0.1%
-772.21   1      < 0.1%
-563.9    1      < 0.1%
-16.93    1      < 0.1%
0.27      1      < 0.1%

Value      Count  Frequency (%)
103184.98  1      < 0.1%
95102.5    1      < 0.1%
88750.34   1      < 0.1%
88646.76   1      < 0.1%
84139.36   1      < 0.1%

MarkDown2
Real number (ℝ)

MISSING
Distinct count: 2715
Unique (%): 92.9%
Missing: 5269
Missing (%): 64.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 3384.1765936323177
Minimum: -265.76
Maximum: 104519.54
Zeros: 3
Zeros (%): < 0.1%
Memory size: 64.1 KiB

Quantile statistics

Minimum: -265.76
5-th percentile: 2.98
Q1: 68.88
Median: 364.57
Q3: 2153.35
95-th percentile: 17261.44
Maximum: 104519.54
Range: 104785.3
Interquartile range (IQR): 2084.47

Descriptive statistics

Standard deviation: 8793.583016
Coefficient of variation (CV): 2.598440942
Kurtosis: 32.34218663
Mean: 3384.176594
Median Absolute Deviation (MAD): 355.37
Skewness: 4.962258122
Sum: 9885179.83
Variance: 77327102.25
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
3                    11     0.1%
1.5                  10     0.1%
0.5                  9      0.1%
4                    9      0.1%
0.03                 8      0.1%
1.91                 8      0.1%
6                    7      0.1%
9                    5      0.1%
3.82                 5      0.1%
5.73                 5      0.1%
Other values (2705)  2844   34.7%
(Missing)            5269   64.3%

Value    Count  Frequency (%)
-265.76  1      < 0.1%
-192     1      < 0.1%
-35.74   1      < 0.1%
-20      1      < 0.1%
-15.45   1      < 0.1%

Value      Count  Frequency (%)
104519.54  1      < 0.1%
97740.99   1      < 0.1%
92523.94   1      < 0.1%
89121.94   1      < 0.1%
82881.16   1      < 0.1%

MarkDown3
Real number (ℝ)

MISSING
Distinct count: 2885
Unique (%): 79.9%
Missing: 4577
Missing (%): 55.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1760.1001799058954
Minimum: -179.26
Maximum: 149483.31
Zeros: 1
Zeros (%): < 0.1%
Memory size: 64.1 KiB

Quantile statistics

Minimum: -179.26
5-th percentile: 0.782
Q1: 6.6
Median: 36.26
Q3: 163.15
95-th percentile: 1159.758
Maximum: 149483.31
Range: 149662.57
Interquartile range (IQR): 156.55

Descriptive statistics

Standard deviation: 11276.46221
Coefficient of variation (CV): 6.406716127
Kurtosis: 72.06807509
Mean: 1760.10018
Median Absolute Deviation (MAD): 34.16
Skewness: 8.133805548
Sum: 6359241.95
Variance: 127158599.9
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
1                    17     0.2%
3                    15     0.2%
2                    15     0.2%
6                    14     0.2%
0.6                  12     0.1%
4                    11     0.1%
1.2                  10     0.1%
0.24                 9      0.1%
0.5                  9      0.1%
0.3                  9      0.1%
Other values (2875)  3492   42.6%
(Missing)            4577   55.9%

Value    Count  Frequency (%)
-179.26  1      < 0.1%
-89.1    1      < 0.1%
-44.54   1      < 0.1%
-29.1    1      < 0.1%
-23.97   1      < 0.1%

Value      Count  Frequency (%)
149483.31  1      < 0.1%
146394.44  1      < 0.1%
141630.61  1      < 0.1%
139621.51  1      < 0.1%
130129.11  1      < 0.1%

MarkDown4
Real number (ℝ≥0)

MISSING
Distinct count: 3405
Unique (%): 98.3%
Missing: 4726
Missing (%): 57.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 3292.9358862586605
Minimum: 0.22
Maximum: 67474.85
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: 0.22
5-th percentile: 18.4695
Q1: 304.6875
Median: 1176.425
Q3: 3310.0075
95-th percentile: 12863.771
Maximum: 67474.85
Range: 67474.63
Interquartile range (IQR): 3005.32

Descriptive statistics

Standard deviation: 6792.329861
Coefficient of variation (CV): 2.06269727
Kurtosis: 29.00029382
Mean: 3292.935886
Median Absolute Deviation (MAD): 1070.015
Skewness: 4.864484796
Sum: 11406729.91
Variance: 46135744.95
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
3                    5      0.1%
2                    4      < 0.1%
2.5                  4      < 0.1%
4                    4      < 0.1%
9                    4      < 0.1%
2.61                 4      < 0.1%
3.97                 3      < 0.1%
8                    3      < 0.1%
0.63                 3      < 0.1%
12                   3      < 0.1%
Other values (3395)  3427   41.8%
(Missing)            4726   57.7%

Value  Count  Frequency (%)
0.22   2      < 0.1%
0.41   1      < 0.1%
0.46   1      < 0.1%
0.63   3      < 0.1%
0.66   1      < 0.1%

Value     Count  Frequency (%)
67474.85  1      < 0.1%
65344.64  1      < 0.1%
63830.91  1      < 0.1%
63130.81  1      < 0.1%
60065.82  1      < 0.1%

MarkDown5
Real number (ℝ)

MISSING
SKEWED
Distinct count: 4045
Unique (%): 99.9%
Missing: 4140
Missing (%): 50.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 4132.216422222222
Minimum: -185.17
Maximum: 771448.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: -185.17
5-th percentile: 577.679
Q1: 1440.8275
Median: 2727.135
Q3: 4832.555
95-th percentile: 10227.8585
Maximum: 771448.1
Range: 771633.27
Interquartile range (IQR): 3391.7275

Descriptive statistics

Standard deviation: 13086.69028
Coefficient of variation (CV): 3.16699053
Kurtosis: 2923.05653
Mean: 4132.216422
Median Absolute Deviation (MAD): 1482.82
Skewness: 50.2778242
Sum: 16735476.51
Variance: 171261462.4
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
3113.78              2      < 0.1%
986.23               2      < 0.1%
1327.97              2      < 0.1%
2743.18              2      < 0.1%
1064.56              2      < 0.1%
2248.72              1      < 0.1%
1044.74              1      < 0.1%
3154.77              1      < 0.1%
1756.07              1      < 0.1%
6207.39              1      < 0.1%
Other values (4035)  4035   49.3%
(Missing)            4140   50.5%

Value    Count  Frequency (%)
-185.17  1      < 0.1%
-37.02   1      < 0.1%
40.98    1      < 0.1%
60.92    1      < 0.1%
114.25   1      < 0.1%

Value      Count  Frequency (%)
771448.1   1      < 0.1%
108519.28  1      < 0.1%
105223.11  1      < 0.1%
85851.87   1      < 0.1%
63005.58   1      < 0.1%

CPI
Real number (ℝ≥0)

MISSING
Distinct count: 2505
Unique (%): 32.9%
Missing: 585
Missing (%): 7.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 172.46080918276135
Minimum: 126.064
Maximum: 228.9764563
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: 126.064
5-th percentile: 126.5621
Q1: 132.3648387
Median: 182.7640032
Q3: 213.9324122
95-th percentile: 223.8693849
Maximum: 228.9764563
Range: 102.9124563
Interquartile range (IQR): 81.5675735

Descriptive statistics

Standard deviation: 39.7383461
Coefficient of variation (CV): 0.2304195735
Kurtosis: -1.832113304
Mean: 172.4608092
Median Absolute Deviation (MAD): 42.0385282
Skewness: 0.06766805636
Sum: 1311564.454
Variance: 1579.136151
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
132.7160968          33     0.4%
139.1226129          24     0.3%
201.0705712          12     0.1%
224.8025314          12     0.1%
130.683              11     0.1%
129.7706452          11     0.1%
132.4668065          11     0.1%
130.737871           11     0.1%
126.2085484          11     0.1%
129.8364             11     0.1%
Other values (2495)  7458   91.1%
(Missing)            585    7.1%

Value        Count  Frequency (%)
126.064      11     0.1%
126.0766452  11     0.1%
126.0854516  11     0.1%
126.0892903  11     0.1%
126.1019355  11     0.1%

Value        Count  Frequency (%)
228.9764563  3      < 0.1%
228.8892482  1      < 0.1%
228.8020401  1      < 0.1%
228.7796682  3      < 0.1%
228.7298638  6      0.1%

Unemployment
Real number (ℝ≥0)

MISSING
Distinct count: 404
Unique (%): 5.3%
Missing: 585
Missing (%): 7.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.8268210387902695
Minimum: 3.684
Maximum: 14.313
Zeros: 0
Zeros (%): 0.0%
Memory size: 64.1 KiB

Quantile statistics

Minimum: 3.684
5-th percentile: 5.143
Q1: 6.634
Median: 7.806
Q3: 8.567
95-th percentile: 10.926
Maximum: 14.313
Range: 10.629
Interquartile range (IQR): 1.933

Descriptive statistics

Standard deviation: 1.877258594
Coefficient of variation (CV): 0.2398494337
Kurtosis: 2.498221012
Mean: 7.826821039
Median Absolute Deviation (MAD): 0.915
Skewness: 1.067685459
Sum: 59522.974
Variance: 3.524099828
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
8.099               78     1.0%
7.852               56     0.7%
8.163               56     0.7%
8.625               54     0.7%
7.057               52     0.6%
7.441               52     0.6%
6.565               52     0.6%
7.931               52     0.6%
8.2                 52     0.6%
6.891               52     0.6%
Other values (394)  7049   86.1%
(Missing)           585    7.1%

Value  Count  Frequency (%)
3.684  8      0.1%
3.879  13     0.2%
3.896  4      < 0.1%
3.921  13     0.2%
3.932  26     0.3%

Value   Count  Frequency (%)
14.313  42     0.5%
14.18   39     0.5%
14.099  39     0.5%
14.021  36     0.4%
13.975  24     0.3%

IsHoliday
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value  Count  Frequency (%)
False  7605   92.9%
True   585    7.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
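
The calculation described above can be sketched in plain Python. This is an illustrative sketch, not the code the report generator runs: the function name `pearson_r` is hypothetical, and the sample values are the Temperature and Fuel_Price entries of the first five rows shown in the Sample section.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either input is constant.
    return cov / sqrt(var_x * var_y)


# Temperature and Fuel_Price from the first five sample rows of this report.
temperature = [42.31, 38.51, 39.93, 46.63, 46.50]
fuel_price = [2.572, 2.548, 2.514, 2.561, 2.625]
print(pearson_r(temperature, fuel_price))
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the invariance property mentioned above.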

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
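
As a rough illustration of the rank-based calculation above, the sketch below replaces each value by its rank and then applies the covariance formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and assigning tied values their average rank is an assumption of this sketch, not something this report specifies.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# A monotonic but nonlinear relationship still gives a perfect rank correlation.
xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))  # 1.0
```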

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
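
The pair-counting definition above corresponds to the variant usually called tau-a. The following is a minimal quadratic-time sketch of that definition; the function name `kendall_tau_a` is hypothetical, tied pairs are counted as neither concordant nor discordant, and library implementations may apply a tie correction (tau-b) that this sketch omits.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        # A pair is concordant when both variables move in the same direction.
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in xs or ys; it counts toward neither total.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```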

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

Sample

First rows

   Store  Date        Temperature  Fuel_Price  MarkDown1  MarkDown2  MarkDown3  MarkDown4  MarkDown5  CPI         Unemployment  IsHoliday
0  1      05/02/2010  42.31        2.572       NaN        NaN        NaN        NaN        NaN        211.096358  8.106         False
1  1      12/02/2010  38.51        2.548       NaN        NaN        NaN        NaN        NaN        211.242170  8.106         True
2  1      19/02/2010  39.93        2.514       NaN        NaN        NaN        NaN        NaN        211.289143  8.106         False
3  1      26/02/2010  46.63        2.561       NaN        NaN        NaN        NaN        NaN        211.319643  8.106         False
4  1      05/03/2010  46.50        2.625       NaN        NaN        NaN        NaN        NaN        211.350143  8.106         False
5  1      12/03/2010  57.79        2.667       NaN        NaN        NaN        NaN        NaN        211.380643  8.106         False
6  1      19/03/2010  54.58        2.720       NaN        NaN        NaN        NaN        NaN        211.215635  8.106         False
7  1      26/03/2010  51.45        2.732       NaN        NaN        NaN        NaN        NaN        211.018042  8.106         False
8  1      02/04/2010  62.27        2.719       NaN        NaN        NaN        NaN        NaN        210.820450  7.808         False
9  1      09/04/2010  65.86        2.770       NaN        NaN        NaN        NaN        NaN        210.622857  7.808         False

Last rows

      Store  Date        Temperature  Fuel_Price  MarkDown1  MarkDown2  MarkDown3  MarkDown4  MarkDown5  CPI  Unemployment  IsHoliday
8180  45     24/05/2013  67.11        3.627       3249.34    481.82     58.48      1183.23    1309.30    NaN  NaN           False
8181  45     31/05/2013  65.88        3.646       6474.49    411.38     77.06      9.38       4227.27    NaN  NaN           False
8182  45     07/06/2013  70.71        3.633       9977.82    744.29     80.00      4825.71    3597.34    NaN  NaN           False
8183  45     14/06/2013  70.01        3.632       2471.44    517.87     348.54     2612.33    3459.39    NaN  NaN           False
8184  45     21/06/2013  70.13        3.626       4989.34    385.31     178.56     2463.42    3117.94    NaN  NaN           False
8185  45     28/06/2013  76.05        3.639       4842.29    975.03     3.00       2449.97    3169.69    NaN  NaN           False
8186  45     05/07/2013  77.50        3.614       9090.48    2268.58    582.74     5797.47    1514.93    NaN  NaN           False
8187  45     12/07/2013  79.37        3.614       3789.94    1827.31    85.72      744.84     2150.36    NaN  NaN           False
8188  45     19/07/2013  82.84        3.737       2961.49    1047.07    204.19     363.00     1059.46    NaN  NaN           False
8189  45     26/07/2013  76.06        3.804       212.02     851.73     2.06       10.88      1864.57    NaN  NaN           False